The Nobel Prize is five separate prizes that, according to Alfred Nobel's will of 1895, are awarded to “those who, during the preceding year, have conferred the greatest benefit to humankind.”
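Nobel Prizes are awarded in the fields of Physics, Chemistry, Physiology or Medicine, Literature, and Peace. The dataset also contains the Sveriges Riksbank Prize in Economic Sciences under the category Economics, which first appears in 1969.
Let's see what patterns we can find in the data on past Nobel laureates. What can we learn about the Nobel Prize and our world more generally?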
The dataset is available here
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.lines as mlines
pd.options.display.float_format = '{:,.2f}'.format
df_data = pd.read_csv('nobel_prize_data.csv')
df_data.shape
(962, 16)
df_data.columns
Index(['year', 'category', 'prize', 'motivation', 'prize_share',
'laureate_type', 'full_name', 'birth_date', 'birth_city',
'birth_country', 'birth_country_current', 'sex', 'organization_name',
'organization_city', 'organization_country', 'ISO'],
dtype='object')
df_data.sample(5)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 1908 | Peace | The Nobel Peace Prize 1908 | NaN | 1/2 | Individual | Klas Pontus Arnoldson | 1844-10-27 | Gothenburg | Sweden | Sweden | Male | NaN | NaN | NaN | SWE |
| 666 | 1996 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their fundamental contributions to the ec... | 1/2 | Individual | William Vickrey | 1914-06-21 | Victoria, BC | Canada | Canada | Male | Columbia University | New York, NY | United States of America | CAN |
| 510 | 1980 | Medicine | The Nobel Prize in Physiology or Medicine 1980 | "for their discoveries concerning genetically ... | 1/3 | Individual | Baruj Benacerraf | 1920-10-29 | Caracas | Venezuela | Venezuela | Male | Harvard Medical School | Boston, MA | United States of America | VEN |
| 542 | 1983 | Physics | The Nobel Prize in Physics 1983 | "for his theoretical and experimental studies ... | 1/2 | Individual | William Alfred Fowler | 1911-08-09 | Pittsburgh, PA | United States of America | United States of America | Male | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | USA |
| 47 | 1908 | Peace | The Nobel Peace Prize 1908 | NaN | 1/2 | Individual | Fredrik Bajer | 1837-04-21 | Næstved | Denmark | Denmark | Male | NaN | NaN | NaN | DNK |
df_data.duplicated().sum()
0
df_data.isna().sum()
year                       0
category                   0
prize                      0
motivation                88
prize_share                0
laureate_type              0
full_name                  0
birth_date                28
birth_city                31
birth_country             28
birth_country_current     28
sex                       28
organization_name        255
organization_city        255
organization_country     254
ISO                       28
dtype: int64
# Convert each prize share fraction to a percentage; assigning the mapped column avoids chained in-place replace
df_data["pct_share"] = df_data.prize_share.map({"1/1": 100, "1/2": 50, "1/3": 33.33, "1/4": 25})
df_data.birth_date = pd.to_datetime(df_data.birth_date)
df_data.dtypes
year                              int64
category                         object
prize                            object
motivation                       object
prize_share                      object
laureate_type                    object
full_name                        object
birth_date               datetime64[ns]
birth_city                       object
birth_country                    object
birth_country_current            object
sex                              object
organization_name                object
organization_city                object
organization_country             object
ISO                              object
pct_share                       float64
dtype: object
df_data.sample(3)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 641 | 1994 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for their pioneering analysis of equilibria i... | 1/3 | Individual | Reinhard Selten | 1930-10-05 | Breslau (Wroclaw) | Germany (Poland) | Poland | Male | Rheinische Friedrich-Wilhelms-Universität | Bonn | Germany | POL | 33.33 |
| 803 | 2007 | Physics | The Nobel Prize in Physics 2007 | "for the discovery of Giant Magnetoresistance" | 1/2 | Individual | Peter Grünberg | 1939-05-18 | Plzen | Czechoslovakia (Czech Republic) | Czech Republic | Male | Forschungszentrum Jülich | Jülich | Germany | CZE | 50.00 |
| 645 | 1994 | Peace | The Nobel Peace Prize 1994 | "for their efforts to create peace in the Midd... | 1/3 | Individual | Shimon Peres | 1923-08-16 | Vishneva | Poland (Belarus) | Belarus | Male | NaN | NaN | NaN | BLR | 33.33 |
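The share-to-percentage conversion above lists each fraction by hand. As a minimal sketch of a more general approach, the standard library `fractions` module can parse any `"numerator/denominator"` string. The helper name `share_to_pct` is hypothetical and not part of the original notebook.

```python
from fractions import Fraction


def share_to_pct(share: str) -> float:
    """Convert a share string such as "1/3" to a percentage."""
    return float(Fraction(share)) * 100


# The four share values that occur in the dataset
for share in ["1/1", "1/2", "1/3", "1/4"]:
    print(share, round(share_to_pct(share), 2))
```

Under these assumptions it could be applied to the whole column with something like `df_data.prize_share.map(share_to_pct)`, which would also cope with a share the hard-coded list does not cover.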
sex_pct = df_data.sex.value_counts()
sex_pct
Male      876
Female     58
Name: sex, dtype: int64
px.pie(names=sex_pct.index,
values=sex_pct,
hole=0.5,
title = "Percentage of Male vs. Female Laureates")
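To put numbers on the donut chart, the counts printed above can be turned into percentages directly. This is a small sketch that re-enters the two counts by hand; the names `counts` and `pct` are illustrative only.

```python
import pandas as pd

# Counts taken from the value_counts() output above
counts = pd.Series({"Male": 876, "Female": 58}, name="sex")

# Share of each group as a percentage of all individual laureates
pct = (counts / counts.sum() * 100).round(2)
print(pct)
```

On the full DataFrame, `df_data.sex.value_counts(normalize=True)` should give the same proportions as fractions. Organizations drop out of the count because their `sex` value is NaN.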
df_data[df_data.sex == "Female"].head(3)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 1903 | Physics | The Nobel Prize in Physics 1903 | "in recognition of the extraordinary services ... | 1/4 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | Russian Empire (Poland) | Poland | Female | NaN | NaN | NaN | POL | 25.00 |
| 29 | 1905 | Peace | The Nobel Peace Prize 1905 | NaN | 1/1 | Individual | Baroness Bertha Sophie Felicita von Suttner, n... | 1843-06-09 | Prague | Austrian Empire (Czech Republic) | Czech Republic | Female | NaN | NaN | NaN | CZE | 100.00 |
| 51 | 1909 | Literature | The Nobel Prize in Literature 1909 | "in appreciation of the lofty idealism, vivid ... | 1/1 | Individual | Selma Ottilia Lovisa Lagerlöf | 1858-11-20 | Mårbacka | Sweden | Sweden | Female | NaN | NaN | NaN | SWE | 100.00 |
df_data[df_data.full_name.duplicated(keep=False)]
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | 1903 | Physics | The Nobel Prize in Physics 1903 | "in recognition of the extraordinary services ... | 1/4 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | Russian Empire (Poland) | Poland | Female | NaN | NaN | NaN | POL | 25.00 |
| 62 | 1911 | Chemistry | The Nobel Prize in Chemistry 1911 | "in recognition of her services to the advance... | 1/1 | Individual | Marie Curie, née Sklodowska | 1867-11-07 | Warsaw | Russian Empire (Poland) | Poland | Female | Sorbonne University | Paris | France | POL | 100.00 |
| 89 | 1917 | Peace | The Nobel Peace Prize 1917 | NaN | 1/1 | Organization | Comité international de la Croix Rouge (Intern... | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 100.00 |
| 215 | 1944 | Peace | The Nobel Peace Prize 1944 | NaN | 1/1 | Organization | Comité international de la Croix Rouge (Intern... | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 100.00 |
| 278 | 1954 | Chemistry | The Nobel Prize in Chemistry 1954 | "for his research into the nature of the chemi... | 1/1 | Individual | Linus Carl Pauling | 1901-02-28 | Portland, OR | United States of America | United States of America | Male | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | USA | 100.00 |
| 283 | 1954 | Peace | The Nobel Peace Prize 1954 | NaN | 1/1 | Organization | Office of the United Nations High Commissioner... | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 100.00 |
| 297 | 1956 | Physics | The Nobel Prize in Physics 1956 | "for their researches on semiconductors and th... | 1/3 | Individual | John Bardeen | 1908-05-23 | Madison, WI | United States of America | United States of America | Male | University of Illinois | Urbana, IL | United States of America | USA | 33.33 |
| 306 | 1958 | Chemistry | The Nobel Prize in Chemistry 1958 | "for his work on the structure of proteins, es... | 1/1 | Individual | Frederick Sanger | 1918-08-13 | Rendcombe | United Kingdom | United Kingdom | Male | University of Cambridge | Cambridge | United Kingdom | GBR | 100.00 |
| 340 | 1962 | Peace | The Nobel Peace Prize 1962 | NaN | 1/1 | Individual | Linus Carl Pauling | 1901-02-28 | Portland, OR | United States of America | United States of America | Male | California Institute of Technology (Caltech) | Pasadena, CA | United States of America | USA | 100.00 |
| 348 | 1963 | Peace | The Nobel Peace Prize 1963 | NaN | 1/2 | Organization | Comité international de la Croix Rouge (Intern... | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 50.00 |
| 424 | 1972 | Physics | The Nobel Prize in Physics 1972 | "for their jointly developed theory of superco... | 1/3 | Individual | John Bardeen | 1908-05-23 | Madison, WI | United States of America | United States of America | Male | University of Illinois | Urbana, IL | United States of America | USA | 33.33 |
| 505 | 1980 | Chemistry | The Nobel Prize in Chemistry 1980 | "for their contributions concerning the determ... | 1/4 | Individual | Frederick Sanger | 1918-08-13 | Rendcombe | United Kingdom | United Kingdom | Male | MRC Laboratory of Molecular Biology | Cambridge | United Kingdom | GBR | 25.00 |
| 523 | 1981 | Peace | The Nobel Peace Prize 1981 | NaN | 1/1 | Organization | Office of the United Nations High Commissioner... | NaT | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 100.00 |
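The `duplicated(keep=False)` filter above returns every row belonging to a repeat winner. If only the names and their prize counts are wanted, a grouped count is one option. The sketch below uses a handful of rows copied from this dataset; `toy`, `wins` and `repeat_winners` are illustrative names.

```python
import pandas as pd

# A few rows copied from the dataset: two repeat winners and one single-prize laureate
toy = pd.DataFrame({
    "full_name": [
        "Marie Curie, née Sklodowska", "Marie Curie, née Sklodowska",
        "Linus Carl Pauling", "Linus Carl Pauling", "Jan Tinbergen",
    ],
    "year": [1903, 1911, 1954, 1962, 1969],
})

# Count prizes per name and keep only names that appear more than once
wins = toy.groupby("full_name").size()
repeat_winners = wins[wins > 1]
print(repeat_winners)
```

Matching on the name alone assumes that no two different laureates share exactly the same `full_name`, which is worth checking before relying on the result.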
prize_category = df_data.category.value_counts()
prize_bar = px.bar(x=prize_category.index, y=prize_category,title="Number of Prizes per Category", color=prize_category,labels={"x":"Category","y":"Prizes"}, color_continuous_scale="Aggrnyl")
prize_bar.update_layout(coloraxis_showscale=False)
prize_bar.show()
df_data[df_data.category == "Economics"].head(1)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 393 | 1969 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for having developed and applied dynamic mode... | 1/2 | Individual | Jan Tinbergen | 1903-04-12 | the Hague | Netherlands | Netherlands | Male | The Netherlands School of Economics | Rotterdam | Netherlands | NLD | 50.00 |
df_data.sex.value_counts().reset_index()
| | index | sex |
|---|---|---|
| 0 | Male | 876 |
| 1 | Female | 58 |
cat_men_women = df_data.groupby(['category', 'sex'],
as_index=False).agg({'prize': pd.Series.count})
cat_men_women.sort_values('prize', ascending=False, inplace=True)
v_bar_split = px.bar(cat_men_women, x="category", y="prize", color="sex", title="Number of Prizes Awarded per Category Split by Men and Women")
v_bar_split.update_layout(xaxis_title='Nobel Prize Category',
yaxis_title='Number of Prizes')
v_bar_split.show()
year_data = df_data.groupby(['year'],
as_index=False).agg({'prize': pd.Series.count})
prize_average = year_data.rolling(on="year",window=5).mean()
plt.figure(figsize=(14,8), dpi=200)
plt.xticks(np.arange(1900, 2021, 5),fontsize=8)
plt.yticks(fontsize=14)
plt.title("Rolling Average of the Number of Prizes", fontsize=20)
plt.grid(color='gray',linestyle='dashed')
ax1 = plt.gca() # get current axes
ax1.set_xlabel('Year', fontsize=14)
ax1.set_ylabel("Prizes", fontsize=14)
average_scatter = plt.plot(prize_average.year, prize_average.prize, color="crimson",linewidth=3)
data_scatter = plt.scatter(x=year_data.year, y=year_data.prize, color="dodgerblue")
red_line = mlines.Line2D([], [], color='crimson', linestyle='-',
markersize=15, label='Rolling Average')
blue_patch = mpatches.Patch(color='dodgerblue', label='Data Points')
plt.legend(handles=[red_line, blue_patch],loc="lower right")
plt.show()
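The crimson line starts a few years after the first data point because a 5-year rolling window yields NaN until five observations are available. A minimal sketch of that behaviour, using made-up yearly counts rather than the real data:

```python
import pandas as pd

# Hypothetical yearly prize counts, for illustration only
prizes = pd.Series([6, 7, 7, 6, 5, 6, 6], index=range(1901, 1908), name="prize")

# Default: a value appears only once 5 observations are available
strict = prizes.rolling(window=5).mean()

# min_periods=1 averages whatever is available so far
lenient = prizes.rolling(window=5, min_periods=1).mean()

print(pd.DataFrame({"prizes": prizes, "strict": strict, "lenient": lenient}))
```

Whether to pass `min_periods` is a judgment call: the default keeps every plotted value a true 5-year average, while `min_periods=1` extends the line to the first year at the cost of noisier early values.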
share_year = df_data.groupby(['year'],as_index=False).agg({'pct_share': pd.Series.mean})
rolling_share = share_year.rolling(on="year",window=5).mean()
plt.figure(figsize=(14,8), dpi=200)
plt.xticks(np.arange(1900, 2021, 5),fontsize=8)
plt.yticks(fontsize=14)
plt.title("Rolling Average Share of the Prizes vs Rolling Average prizes", fontsize=20)
plt.grid(color='gray',linestyle='dashed')
ax1 = plt.gca()
ax2 = ax1.twinx()
ax1.invert_yaxis()
ax1.set_xlabel('Year', fontsize=14)
ax2.set_ylabel("Prizes Average", fontsize=14, color="dodgerblue")
ax1.set_ylabel("Prizes Share", fontsize=14, color="crimson")
ax2.scatter(x=year_data.year, y=year_data.prize, color="pink")
ax1.plot(rolling_share.year, rolling_share.pct_share, color="crimson", linewidth=3)
ax2.plot(prize_average.year, prize_average.prize, color="dodgerblue", linewidth=3)
plt.show()
countries_nobel = df_data.groupby(['birth_country_current'],as_index=False).agg({'prize': pd.Series.count})
top20_countries = countries_nobel.sort_values(by="prize",ascending=False).head(20)
schema = px.bar(top20_countries, x=top20_countries.prize, y=top20_countries.birth_country_current, orientation='h',color=top20_countries.prize,
color_continuous_scale='Viridis')
schema.update_layout(yaxis=dict(autorange="reversed"),
xaxis_title='Prizes',
yaxis_title='Country',
title="Countries with the Most Nobel Prizes")
iso_data = df_data.groupby(['ISO'],as_index=False).agg({'prize': pd.Series.count})
iso_data.head(3)
| | ISO | prize |
|---|---|---|
| 0 | ARG | 4 |
| 1 | AUS | 10 |
| 2 | AUT | 18 |
px.choropleth(iso_data, locations="ISO",
color="prize", # number of prizes per country
hover_name="ISO", # column to add to hover information
color_continuous_scale="matter",
title="Number of Nobel Prizes by Country")
horizontal_country_data = df_data.groupby(['birth_country_current','category'],as_index=False).agg({'prize': pd.Series.count})
total_prize_per_country = horizontal_country_data.groupby("birth_country_current",as_index=False).agg({'prize': pd.Series.sum})
final_countries = pd.merge(horizontal_country_data,total_prize_per_country,on='birth_country_current')
final_countries.rename(columns={"prize_x": "cat_prize", "prize_y": "total_prize"},inplace=True)
countries_detail = px.bar(final_countries,y=final_countries.birth_country_current,x=final_countries.cat_prize, color="category", hover_name=final_countries.total_prize, orientation="h")
countries_detail.update_layout(yaxis=({'categoryorder':'total ascending'}),
xaxis_title='Number of Prizes',
yaxis_title='Country',
title="Nobel Prizes per Country split by Category")
countries_detail.show()
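The per-country total above is built with a second `groupby` followed by a merge and a rename. As an alternative sketch, `transform("sum")` can attach the total to each row in one step. The data below is made up and the column names mirror the renamed ones used above.

```python
import pandas as pd

# Hypothetical per-country, per-category counts
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "category": ["Physics", "Peace", "Physics"],
    "cat_prize": [3, 1, 2],
})

# transform("sum") broadcasts each country's total back onto its own rows
toy["total_prize"] = toy.groupby("country")["cat_prize"].transform("sum")
print(toy)
```

Both routes should produce the same two columns; the transform version simply avoids the intermediate DataFrame.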
prize_year = df_data.groupby(['year','birth_country_current'],as_index=False).agg({'prize': pd.Series.count})
prize_year['cum_prizes']=prize_year.groupby(['birth_country_current'])['prize'].cumsum()
px.line(prize_year, x="year", y="cum_prizes", color="birth_country_current",
line_group="birth_country_current",
hover_name="birth_country_current",
labels = {"birth_country_current": "Birth Countries","year":"Year","cum_prizes":"Number of Prizes"},
title="Number of Prizes per Country Over Time")
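The cumulative column behind this chart is a running total computed separately for each country. A small sketch with made-up counts shows what `groupby(...).cumsum()` produces; `toy` and the country labels are hypothetical.

```python
import pandas as pd

# Hypothetical per-year, per-country prize counts
toy = pd.DataFrame({
    "year": [1901, 1901, 1902, 1903, 1903],
    "country": ["A", "B", "A", "A", "B"],
    "prize": [1, 2, 1, 3, 1],
})

# Sort by year first so the running total accumulates chronologically
toy = toy.sort_values("year", kind="mergesort")
toy["cum_prizes"] = toy.groupby("country")["prize"].cumsum()
print(toy)
```

Country B has no row for 1902, so its line simply has no point in that year. The same holds in the chart above for any country and year without a prize.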
organization_rank = df_data.organization_name.value_counts().head(20)
organization_sch = px.bar(organization_rank, x=organization_rank, y=organization_rank.index, orientation='h',color=organization_rank,
color_continuous_scale='Viridis')
organization_sch.update_layout(yaxis=dict(autorange="reversed"),
xaxis_title='Number of Prizes',
yaxis_title='Institution',
title="Top Research Organizations",
showlegend=False)
cities_top = df_data.organization_city.value_counts().head(20)
cities_sch = px.bar(cities_top, x=cities_top, y=cities_top.index, orientation='h',color=cities_top,
color_continuous_scale='Viridis')
cities_sch.update_layout(yaxis=dict(autorange="reversed"),
xaxis_title='Number of Prizes',
yaxis_title='City',
title="Cities that make the most Discoveries",
showlegend=False)
top20_birth = df_data.birth_city.value_counts().head(20)
births_sch = px.bar(top20_birth, x=top20_birth, y=top20_birth.index, orientation='h',color=top20_birth,
color_continuous_scale='Plasma')
births_sch.update_layout(yaxis=dict(autorange="reversed"),
xaxis_title='Number of Prizes',
yaxis_title='City',
title="Laureate Birth Cities",
showlegend=False)
gloglo = df_data.groupby(['organization_name',"organization_country","organization_city"],as_index=False).agg({'prize': pd.Series.count})
px.sunburst(gloglo, path=['organization_country','organization_city','organization_name'],
values=gloglo.prize,
title="Sunburst Chart with Country, City and Organization")
winning_age = df_data.year - df_data.birth_date.dt.year
df_data["winning_age"] = winning_age
df_data.sort_values(by="winning_age").head(3)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share | winning_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 885 | 2014 | Peace | The Nobel Peace Prize 2014 | "for their struggle against the suppression of... | 1/2 | Individual | Malala Yousafzai | 1997-07-12 | Mingora | Pakistan | Pakistan | Female | NaN | NaN | NaN | PAK | 50.00 | 17.00 |
| 85 | 1915 | Physics | The Nobel Prize in Physics 1915 | "for their services in the analysis of crystal... | 1/2 | Individual | William Lawrence Bragg | 1890-03-31 | Adelaide | Australia | Australia | Male | Victoria University | Manchester | United Kingdom | AUS | 50.00 | 25.00 |
| 932 | 2018 | Peace | The Nobel Peace Prize 2018 | “for their efforts to end the use of sexual vi... | 1/2 | Individual | Nadia Murad | 1993-07-02 | Kojo | Iraq | Iraq | Female | NaN | NaN | NaN | IRQ | 50.00 | 25.00 |
df_data.sort_values(by="winning_age",ascending=False).head(3)
| | year | category | prize | motivation | prize_share | laureate_type | full_name | birth_date | birth_city | birth_country | birth_country_current | sex | organization_name | organization_city | organization_country | ISO | pct_share | winning_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 937 | 2019 | Chemistry | The Nobel Prize in Chemistry 2019 | “for the development of lithium-ion batteries” | 1/3 | Individual | John Goodenough | 1922-07-25 | Jena | Germany | Germany | Male | University of Texas | Austin TX | United States of America | DEU | 33.33 | 97.00 |
| 933 | 2018 | Physics | The Nobel Prize in Physics 2018 | “for the optical tweezers and their applicatio... | 1/2 | Individual | Arthur Ashkin | 1922-09-02 | New York, NY | United States of America | United States of America | Male | Bell Laboratories | Holmdel, NJ | United States of America | USA | 50.00 | 96.00 |
| 794 | 2007 | Economics | The Sveriges Riksbank Prize in Economic Scienc... | "for having laid the foundations of mechanism ... | 1/3 | Individual | Leonid Hurwicz | 1917-08-21 | Moscow | Russia | Russia | Male | University of Minnesota | Minneapolis, MN | United States of America | RUS | 33.33 | 90.00 |
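The `winning_age` column is the prize year minus the birth year, so it can overstate a laureate's age by one when the birthday falls after the award date. A minimal sketch of an exact calculation follows, assuming the award is dated 10 December of the prize year. The helper `age_on` is hypothetical, and the second example uses a made-up birth date.

```python
from datetime import date


def age_on(birth: date, on: date) -> int:
    """Full years elapsed between a birth date and a given date."""
    # Subtract one if the birthday has not yet occurred in that year
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - before_birthday


# Birth date of Malala Yousafzai from the table above, 2014 prize
print(age_on(date(1997, 7, 12), date(2014, 12, 10)))

# Made-up late-December birthday: year subtraction would say 50
print(age_on(date(1950, 12, 25), date(2000, 12, 10)))
```

For summary statistics over more than 900 laureates the difference is small, but it can matter when ranking the youngest or oldest winners.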
# sns.set() resets style and context to their defaults, so configure everything in one call
sns.set_theme(style="darkgrid", context="talk", rc={'figure.figsize': (14, 6)})
test = sns.histplot(df_data, x=df_data.winning_age,bins=15).set_title("Laureate Age at the time of the Award")
plt.xlabel("Laureate Age")
plt.show()
df_data["winning_age"].describe()
count   934.00
mean     59.95
std      12.62
min      17.00
25%      51.00
50%      60.00
75%      69.00
max      97.00
Name: winning_age, dtype: float64
sns.regplot(data=df_data,x=df_data.year,
y=df_data.winning_age,
lowess=True,
scatter_kws={'alpha':0.3},
line_kws={'color': 'crimson'}).set_title("Age at Time of Award throughout History")
plt.ylabel("Laureate Age")
plt.xlabel("Year")
plt.show()
# Configure style, context and figure size in one call so none of them is reset
sns.set_theme(style="darkgrid", context="notebook", rc={'figure.figsize': (14, 6)})
sns.boxplot(data=df_data,x="category", y="winning_age").set_title("Winning Age Across the Nobel Prize Categories")
plt.xlabel("Category")
plt.ylabel("Laureate Age")
plt.show()
sns.lmplot(data=df_data, x="year", y="winning_age", row="category", lowess=True)
plt.show()
with sns.axes_style("whitegrid"):
    sns.lmplot(data=df_data,
               x='year',
               y='winning_age',
               hue='category',
               lowess=True,
               aspect=2,
               scatter_kws={'alpha': 0.5},
               line_kws={'linewidth': 5})
plt.title("Winning Age Across the Nobel Prize Categories")
plt.ylabel("Laureate Age")
plt.xlabel("Year")
plt.show()